.if FALSE then begin "hide outline"

not clear that this chapter belongs in book.
unless as part of a chapter on inductively defined data structures
	as technology for programming and proving

Additional exercises 
    ??
It might be worthwhile to give an example or an exercise involving a
concrete syntax of strings where the basic operations are character
operations.  For example, the student could write programs to test
whether an expression is a sum by finding the principal connective
and correctly skipping over parentheses.  He should also prove that
the abstract relations between the synthetic and analytic syntaxes
are satisfied.  Is this all that has to be proved?

[For later addition:  absyn of WFFs, TERMs, ...
	or λ-expressions and operations of λ-calc...
	or see LSPCMT:General Comments on Content for more ideas
]


.end "hide outline"
.s(ABSNTX,ABSTRACT SYNTAX)
.if NOTASSEMBLING then start
.COMPIL: NEXT SECTION!;
.COMPUT: NEXT SECTION!;
.end
	
.ss(abstract,What is Abstract Syntax?)

	When we compute with symbolic expressions, whether they be
computer programs, algebraic expressions, or formulas of logic, some
aspects of their representation are irrelevant for many purposes.  For
example, the sum of ⊗a and ⊗b may be written ⊗⊗a_+_b⊗ as in the most
common algebraic notation, $$(PLUS ⊗a ⊗⊗b⊗)$ as in LISP, ⊗⊗+ab⊗ in "Polish
notation" or ⊗⊗ab+⊗ in the reverse Polish notation made popular by the
Hewlett-Packard calculators.  Mathematical logicians use representations of
expressions by "qGoedel numbers" in order to represent symbolic
computations as numerical computations.   One such qGoedel numbering
represents the sum of ⊗a and ⊗b as 2%5a%*3%5b%*.  Of course, "the sum of ⊗a 
and ⊗⊗b⊗" is just another such notation which we have been using to talk
about the others.

	For many aspects of evaluating sums or
compiling them, the details of the representation are irrelevant.
Our programs must determine whether an expression is a sum and, if so, what the
summands are, and they may need to make sums from constituent
expressions.
The predicate that tells whether an expression is a sum and the functions
that extract the summands comprise the %2analytic syntax%1 of sums.
The function that makes a sum expression out of the summand expressions
gives the %2synthetic syntax%1 of sums.  The analytic and synthetic
syntaxes are, of course, related to each other.

	⊗⊗Abstract syntax⊗, first described in [McCarthy 1962b],
is a way of presenting only the information about a class of
expressions that is relevant to computing with them, omitting
irrelevant detail.
Later we shall prove a small compiler correct without ever deciding
on a definite notation.  If we decide on a notation for some expressions
and write programs to test whether an expression
is of a given type, to extract subexpressions and to construct new 
expressions from old ones, then we have given a ⊗⊗concrete_syntax⊗.
Any concrete syntax  corresponding to our abstract syntax will do.

.item ←0
.bb |Example #. Simplification of arithmetic expressions.|

	Consider arithmetic expressions built up from constants and variables
by forming sums and products.   The abstract syntax is given
in table_{table expsyn}.
.begin nofill select 2 group turnon "∂"
.n←12

%3∂(n)predicate∂(n+16)analytic operation∂(n+40)synthetic operation%*

∂(n)isconst(e)
∂(n)isvar(e)

∂(n)issum(e)∂(n+18)s1(e),   s2(e)∂(n+44)mksum(e1,e2)
∂(n)isprod(e)∂(n+18)p1(e),   p2(e)∂(n+44)mkprod(e1,e2)
.end
!tableexpsyn Abstract Syntax of Arithmetic Expressions.!

	The %3predicate%* column asserts that the expressions comprise 
constants, variables and binary sums and products, and that the predicates 
⊗isconst, ⊗isvar, ⊗issum, and ⊗isprod enable one to classify each expression.
The %3analytic operations%* column tells us that variables and constants
are "atomic" expressions, while sums and products have subexpressions obtained
by applying ⊗s1 and ⊗s2 in the case of sums, or ⊗p1 and ⊗p2 in the case of
products.

	The predicates and analytic operations comprise the 
⊗⊗abstract analytic syntax⊗ of these expressions, i.e. they allow us to analyse
the expressions.  The analytic syntax provides sufficient information to allow
us to write an interpreter (evaluator) thus defining the semantics of our 
expression language. Thus we have
.begin nofill group turnon "∂"
.n←20

∂(n)⊗⊗value(e,qx) ← ⊗
∂(n+4)⊗⊗qif isconst(e) qthen val(e) ⊗
!fcnvalue&  ∂(n+4)⊗⊗qelse qif isvar(e) qthen c(e,qx)⊗ 
∂(n+4)⊗⊗qelse qif issum(e) qthen value(s1(e),qx) + value(s2(e),qx)⊗ 
∂(n+4)⊗⊗qelse qif isprod(e) qthen value(p1(e),qx) q* value(p2(e),qx)⊗ 
.end

where ⊗val(e) gives the numerical value of an expression ⊗e representing
a constant, ⊗⊗c(e,qx)⊗ gives the value of the variable ⊗e in the state
vector qx and "+" and "q*" are some binary operations.  
(It is natural to regard "+" and "q*" as operations that resemble 
addition and multiplication of real numbers, but we shall need only a
few properties of these operations.)
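
	Although the formal development never fixes a notation, it may help to see one
possible concrete syntax, written here in present-day LISP: constants are numbers,
variables are symbols, sums are lists beginning with the atom PLUS, products are lists
beginning with the atom TIMES, and a state vector is an association list.  All of these
representation choices, and the function definitions below, are only an illustration;
any other concrete syntax satisfying the same relations would do.
.begin nofill

;; one possible concrete syntax for the abstract syntax of arithmetic
;; expressions: numbers, symbols, (PLUS e1 e2) and (TIMES e1 e2)
(defun isconst (e) (numberp e))
(defun isvar   (e) (symbolp e))
(defun issum   (e) (and (consp e) (eq (car e) 'plus)))
(defun isprod  (e) (and (consp e) (eq (car e) 'times)))

(defun s1 (e) (cadr e))                    ; first summand
(defun s2 (e) (caddr e))                   ; second summand
(defun p1 (e) (cadr e))                    ; first factor
(defun p2 (e) (caddr e))                   ; second factor

(defun mksum  (e1 e2) (list 'plus e1 e2))
(defun mkprod (e1 e2) (list 'times e1 e2))

(defun val (e) e)                          ; a numeral denotes itself
(defun c (x state) (cdr (assoc x state)))  ; value of x in a state vector

;; transcription of the interpreter value(e,xi)
(defun value (e state)
  (cond ((isconst e) (val e))
        ((isvar e)   (c e state))
        ((issum e)   (+ (value (s1 e) state) (value (s2 e) state)))
        ((isprod e)  (* (value (p1 e) state) (value (p2 e) state)))))

;; for example, (value '(plus x (times 3 y)) '((x . 2) (y . 5))) is 17
.end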

	The %3synthetic operation%*s column of table_{table expsyn} 
says that sums can be constructed from a pair of expressions using the
operation ⊗mksum and similarly products can be constructed from a pair
of expressions using ⊗mkprod.  These operations form the 
⊗⊗abstract synthetic syntax⊗ of expressions, i.e. they allow us to
construct new expressions from old ones.  

	Using the abstract syntax functions and predicates we can 
write a function that simplifies an expression by 
.begin nofill   indent 12

##. removing 0's from sums,
##. removing 1's from products,
##. replacing a product containing 0 as a factor by 0.
.end

We will assume the constants 0 and 1 are represented by "0" and "1",
for simplicity.  Then ⊗simplify is given by
.begin nofill turnon "∂" group turnoff "{}"
.n←16

∂(n)⊗⊗simplify(e) ← ⊗
∂(n)⊗⊗    qif isvar(e) ∨ isconst(e) qthen e⊗
∂(n)⊗⊗    qelse qif issum(e) qthen ⊗
∂(n)⊗⊗          {simplify(s1(e)),simplify(s2(e))} ⊗
∂(n)⊗⊗          [λe1 e2. qif e1=0 qthen e2 ⊗
∂(n)⊗⊗                   qelse qif e2=0 qthen e1 ⊗
!fcnsimplify& ∂(n)⊗⊗                   qelse mksum(e1,e2)] ⊗
∂(n)⊗⊗    qelse qif isprod(e) qthen ⊗
∂(n)⊗⊗          {simplify(p1(e)),simplify(p2(e))} ⊗
∂(n)⊗⊗          [λe1 e2. qif e1=0 ∨ e2=0 qthen 0 ⊗
∂(n)⊗⊗                   qelse qif e1=1 qthen e2 ⊗
∂(n)⊗⊗                   qelse qif e2=1 qthen e1 ⊗
∂(n)⊗⊗                   qelse mkprod(e1,e2)] ⊗
.end
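
	Assuming the concrete representation sketched above (again only an
illustration, with constants represented by numbers so that the tests against 0 and 1
are numeric comparisons), ⊗simplify can be transcribed directly; the λ-binding of the
simplified subexpressions becomes a let.
.begin nofill

;; transcription of simplify, using the concrete syntax sketched above
(defun simplify (e)
  (cond ((or (isvar e) (isconst e)) e)
        ((issum e)
         (let ((e1 (simplify (s1 e)))
               (e2 (simplify (s2 e))))
           (cond ((eql e1 0) e2)
                 ((eql e2 0) e1)
                 (t (mksum e1 e2)))))
        ((isprod e)
         (let ((e1 (simplify (p1 e)))
               (e2 (simplify (p2 e))))
           (cond ((or (eql e1 0) (eql e2 0)) 0)
                 ((eql e1 1) e2)
                 ((eql e2 1) e1)
                 (t (mkprod e1 e2)))))))

;; for example, (simplify '(plus (times 0 x) (times 1 y))) is Y
.end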


	Suppose we would like to prove something about our functions
on expressions.  For example, we would like to know that the simplifier will
indeed terminate (i.e. that there are no infinite expressions).  Then we need
a way of expressing the fact that the only expressions allowed
are those that can be constructed from variables and constants by a finite
number of applications of the sum and product constructions.  
The following schema is an induction principle for expressions essentially
expressing this fact.

.begin indent 4,4
    Suppose qF is a predicate applicable to expressions, and suppose that
for all expressions ⊗e, we have
.end
.begin nofill select 2 turnon "∂" group
.n←24

∂(n)isconst(e) ⊃ qF(e)  %1and%*
!eqnfullindhyp ∂(n)isvar(e) ⊃ qF(e)  %1and%*
∂(n) issum(e) ∧ qF(s1(e)) ∧ qF(s2(e)) ⊃ qF(e)   %1and%*
∂(n)isprod(e) ∧ qF(p1(e)) ∧ qF(p2(e)) ⊃ qF(e).
.end

    Then we may conclude that qF(e) is true for all expressions ⊗e.  

Using this principle we can show that the simplifier terminates by the
techniques of Chapter {SECTION PROVIN}.
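
	For example, the principle licenses definitions by structural recursion, such
as the following size function (written in the notation of the earlier sketches, and
ours rather than part of the formal development); ⊗simplify terminates because each
recursive call is applied to a proper subexpression, whose size is smaller.
.begin nofill

;; structural recursion justified by the induction principle:
;; every expression has a finite size, and the subexpressions
;; s1(e), s2(e), p1(e), p2(e) are smaller than e
(defun expsize (e)
  (cond ((or (isconst e) (isvar e)) 1)
        ((issum e)  (+ 1 (expsize (s1 e)) (expsize (s2 e))))
        ((isprod e) (+ 1 (expsize (p1 e)) (expsize (p2 e))))))
.end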

	Now suppose we would like to show that the simplifier 
preserves the value of expressions.  Then we will need some facts about
the relations satisfied by the analytic and synthetic components of the
syntax.  Thus for expressions ⊗e, ⊗e1, ⊗e2 we have
.begin nofill turnon "∂"
.n←24

∂(n)⊗⊗issum(mksum(e1,e2)) ⊗
∂(n)⊗⊗s1(mksum(e1,e2))=e1 ⊗
!eqnsumrel ∂(n)⊗⊗s2(mksum(e1,e2))=e2 ⊗
∂(n)⊗⊗issum(e) ⊃ e=mksum(s1(e),s2(e)) ⊗

∂(n)⊗⊗isprod(mkprod(e1,e2)) ⊗
∂(n)⊗⊗p1(mkprod(e1,e2))=e1 ⊗
!eqnprodrel ∂(n)⊗⊗p2(mkprod(e1,e2))=e2 ⊗
∂(n)⊗⊗isprod(e) ⊃ e=mkprod(p1(e),p2(e)) ⊗
.end

Using these relations, the principle of induction, and some simple 
properties of the + and q* operations involving 0 and 1 we can show
that 

!eqnpreserve ⊗⊗⊗value(e,qx)=value(simplify(e),qx)⊗

for all expressions ⊗e and state vectors qx.

.bb |Example #. Abstract and Concrete Syntax of LISP terms.|

	Now we will give the abstract and concrete
analytic syntax of pure (functional) LISP terms.  We will see in the case
of LISP the concrete syntax is very close to the abstract syntax.  This
is one reason that it is so easy to write LISP programs to manipulate LISP
programs.  We will also introduce in this example additional techniques which
are useful for defining abstract syntax.

	A pure LISP term is essentially any term that can occur on the right hand
side of a defining equation as described in Chapter {SECTION READIN} (excluding
labels).
Thus LISP terms are either variables, constants, conditional terms, list terms, 
or application terms, denoted by ⊗lvar, ⊗lconst, ⊗lcond, ⊗llist and ⊗lappl 
respectively.  

	We will use some further classes of expressions which are not LISP 
terms but are useful in analyzing the composite terms.  These are the
classes of function expressions: basic functions (⊗lbfun), 
defined functions (⊗ldfun), and lambda expressions (⊗llamb)  
that occur in application terms.  

	Finally some of the components of the expressions will be lists
of expressions.  We will use ⊗car, ⊗cdr and ⊗null to analyse the lists of
expressions.  It should be emphasized that this use is in an abstract sense
and does not depend on any properties of a particular representation of
lists.  There will be three special kinds of lists: the list of pairs of
terms in a conditional term (⊗lpairlist), the list of arguments in an
application term (⊗ltermlist), and the list of variables in a lambda expression
(⊗lvarlist).  

	The abstract analytic syntax of LISP terms is given in 
table_{table lispsyn}.

.begin nofill turnon "∂" select 2 group
.n←20  m←40

∂(n)%3predicate∂(m)associated functions%*

∂(n)lconst(e)
∂(n)lvar(e)
∂(n)lbfun(e)
∂(n)ldfun(e)
∂(n)null(e)

∂(n)lcond(e)∂(m)pairl(e)
∂(n)llist(e)∂(m)terml(e)
∂(n)lappl(e)∂(m)funx(e),  argl(e)

∂(n)llamb(e)∂(m)varl(e),  body(e)
∂(n)lpairlist(e)∂(m)ifpart(e), thenpart(e), elsepart(e)
∂(n)ltermlist(e)∂(m)car(e),   cdr(e)
∂(n)lvarlist(e)∂(m)car(e),   cdr(e)
.end
!tablelispsyn LISP Abstract Syntax.!

	In the usual S-expression representation of LISP terms 
(internal form) we would have for the basic predicates
the following definitions
.begin nofill turnon "∂" group
.n←12

∂(n)⊗⊗lconst x ← [qat x ∧ [numberp x ∨ x=qT ∨ x=qNIL]] ∨ qa x=$QUOTE ⊗
∂(n)⊗⊗lvar x ← qat x ∧ ¬[numberp x ∨ x=qT ∨ x=qNIL] ⊗
∂(n)⊗⊗lbfun(e) ← [e=$CAR ∨ e=$CDR ∨ e=$CONS ∨ e=$ATOM ∨ e=$$EQ$]⊗
∂(n)⊗⊗ldfun(e) ← qat e ∧ ¬[numberp(e) ∨ e=qT ∨ e=qNIL ∨ lbfun(e)] ⊗
∂(n)⊗⊗null(e) ← qn e ⊗
.end
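
	In present-day LISP the basic predicates can be transcribed as follows.  The
transcription is ours; a test that the argument is a pair has been added before its
first element is examined, so that ⊗lconst can safely be applied to any S-expression.
.begin nofill

;; S-expression versions of the basic predicates on internal form
(defun lconst (x)
  (or (and (atom x) (or (numberp x) (eq x t) (null x)))
      (and (consp x) (eq (car x) 'quote))))   ; (QUOTE ...) is a constant

(defun lvar (x)
  (and (atom x) (not (or (numberp x) (eq x t) (null x)))))

(defun lbfun (e)
  (member e '(car cdr cons atom eq)))

(defun ldfun (e)
  (and (atom e)
       (not (or (numberp e) (eq e t) (null e) (lbfun e)))))

;; the null test on lists of expressions is simply the built-in null
.end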


The concrete syntax for conditional terms and the related auxiliary expressions 
is given by
.begin nofill turnon "∂"
.n←12

∂(n)⊗⊗lcond(e) ← qa e=$COND ∧ lpairlist(pairl(e))⊗
∂(n)⊗⊗pairl(e) ← qd e ⊗
∂(n)⊗⊗lpairlist(e) ← qn e ∨ [lterm(ifpart(e)) ∧ lterm(thenpart(e)) ∧ lpairlist(elsepart(e))]⊗
∂(n)⊗⊗ifpart(e) ← qaa e⊗
∂(n)⊗⊗thenpart(e) ← qada e⊗
∂(n)⊗⊗elsepart(e) ← qd e⊗
.end

For list terms we have
.begin nofill turnon"∂"
.n←12

∂(n)⊗⊗llist(e) ← qa e=$LIST ∧ ltermlist(terml(e))⊗
∂(n)⊗⊗terml(e) ← qd e ⊗
∂(n)⊗⊗ltermlist(e) ← qn e ∨ [lterm(car(e)) ∧ ltermlist(cdr(e))]⊗
.end

The concrete syntax for application terms and lambda expressions is the following 
.begin nofill turnon"∂"
.n←12

∂(n)⊗⊗lappl(e) ← [lbfun(funx(e)) ∨ ldfun(funx(e)) ∨ llamb(funx(e))] ∧ ltermlist(argl(e))⊗
∂(n)⊗⊗funx(e) ← qa e⊗
∂(n)⊗⊗argl(e) ← qd e⊗
∂(n)⊗⊗llamb(e) ← qa e=$LAMBDA ∧ lvarlist(varl(e)) ∧ lterm(body(e))⊗
∂(n)⊗⊗varl(e) ← qad e⊗
∂(n)⊗⊗body(e) ← qadd e⊗
∂(n)⊗⊗lvarlist(e)← qn e ∨ [lvar(car(e)) ∧ lvarlist(cdr(e))]⊗
.end
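
	The composite predicates and selectors can be transcribed the same way,
building on the basic predicates above.  The predicate ⊗lterm, which ties the classes
together, is not given explicitly in the text; the definition below is our reading of
what a term is.  The ⊗consp guards are also ours, added so that the predicates do not
fail on atoms; otherwise well-formed list structure is assumed.
.begin nofill

;; composite predicates and selectors for the internal form
(defun lterm (e)
  (or (lconst e) (lvar e) (lcond e) (llist e) (lappl e)))

(defun pairl (e) (cdr e))
(defun ifpart (e) (caar e))
(defun thenpart (e) (cadar e))
(defun elsepart (e) (cdr e))
(defun lcond (e)
  (and (consp e) (eq (car e) 'cond) (lpairlist (pairl e))))
(defun lpairlist (e)
  (or (null e)
      (and (lterm (ifpart e)) (lterm (thenpart e))
           (lpairlist (elsepart e)))))

(defun terml (e) (cdr e))
(defun llist (e)
  (and (consp e) (eq (car e) 'list) (ltermlist (terml e))))
(defun ltermlist (e)
  (or (null e) (and (lterm (car e)) (ltermlist (cdr e)))))

(defun funx (e) (car e))
(defun argl (e) (cdr e))
(defun lappl (e)
  (and (consp e)
       (or (lbfun (funx e)) (ldfun (funx e)) (llamb (funx e)))
       (ltermlist (argl e))))

(defun varl (e) (cadr e))
(defun body (e) (caddr e))
(defun llamb (e)
  (and (consp e) (eq (car e) 'lambda)
       (lvarlist (varl e)) (lterm (body e))))
(defun lvarlist (e)
  (or (null e) (and (lvar (car e)) (lvarlist (cdr e)))))

;; for example, (lterm '(cond ((atom x) x) (t (f (car x))))) is non-NIL
.end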

.skip 2
.item←0
.cb Exercises

#. Prove the statements made about the expression simplifier in example 1.
That is, prove it terminates by defining a predicate ⊗isexp such that 

⊗⊗⊗isexp(e) ≡ isvar(e) ∨ isconst(e) ∨ issum(e) ∨ isprod(e)⊗

and showing that ⊗isexp(simplify(e)) for all expressions ⊗e.  
Then show that the statement {eqn preserve} holds for all expressions.

#.  Work out the synthetic syntax for LISP terms and state relations
for each construct analogous to those given for the expressions in 
example 1.  State an induction principle for LISP terms.
.ss(mcpain,Correctness of a Compiler for Arithmetic Expressions.)

	In the following sections we prove the correctness of a simple
compiling algorithm for compiling arithmetic expressions into machine language.
The presentation is adapted from that of
McCarthy and Painter [1967].  

	We begin with a brief outline of the methods used to formalize
the problem.  The analytic abstract syntax is given for both the
arithmetic expressions (source language) and machine language programs
(object language).  An interpreter defined in terms of the analytic syntax
is given for each language.  
The two interpreters are written in the same language,
and thus define the semantics of the two languages in common terms.
The interpreters each make use of a ⊗state ⊗vector to represent the values
of variables (in the expression language case) or to store contents 
of registers (in the machine language case).   The synthetic syntax 
is also given for the machine language and the compiler is defined
in terms of the analytic syntax of the source language and the synthetic
syntax of the object language.  Thus the compiler breaks down terms in the
source language and rebuilds them in the object language.
To show that the compiler is correct we must show that
for an expression ⊗e the interpretation of ⊗e  by the expression 
interpreter gives the same answer as the execution of the compiled code
for ⊗e by the machine language interpreter, assuming that they start out
with the same values for all of the variables in ⊗e.  


	The expressions of our source language are a subclass of those
described in qsect{subsection abstract}.
The computer language into which these expressions are compiled is for 
a single address computer with an accumulator, called ⊗ac, and four instructions:
⊗li (load immediate), ⊗load, ⊗sto (store) and ⊗add.  (Our simple language
does not require the use of jump instructions.)
The compiler produces code that computes the value of the expression being
compiled and leaves this value in the accumulator.  For example, the expression
⊗⊗(x_+_3)_+_(x_+_(y_+_2))⊗ is compiled into code which in assembly language might look as follows:
.skip
.begin nofill turnon "∂" select 2
.n←35

∂(n)load∂(n+6)x
∂(n)sto∂(n+6)t
∂(n)li∂(n+6)3
∂(n)add∂(n+6)t
∂(n)sto∂(n+6)t
∂(n)load∂(n+6)x
∂(n)sto∂(n+6)t + 1
∂(n)load∂(n+6)y
∂(n)sto∂(n+6)t + 2
∂(n)li∂(n+6)2
∂(n)add∂(n+6)t + 2
∂(n)add∂(n+6)t + 1
∂(n)add∂(n+6)t

.end

Again because we are using abstract syntax there is no commitment to a
precise form for the object code.


.ss(sourcelang,The source language. )

	As mentioned above the source language is a subset of the expression
language described in qsect{subsection abstract}.  In particular, we will consider expressions
that are variables, constants or sums.  Products are omitted as they introduce
nothing new into the problem and serve only to make everything longer.
Thus we will use the relevant parts of table {table expsyn} for the abstract
analytic syntax of source expressions.  
The semantics is given by the interpreter ⊗value {eqn value}
by simply ignoring the ⊗isprod clause.

	For proving the correctness of compilers,
we don't need a synthetic syntax for
source language expressions since the interpreter and compiler
use only the analytic syntax.  The induction principle for this simpler
class of expressions  is obtained by deleting the ⊗isprod clause from
the induction hypothesis {eqn fullindhyp}.  Thus

.begin indent 4,4
    If qF is a predicate applicable to expressions, and if for all 
expressions ⊗e, we have
.begin nofill select 2 turnon "∂" group
.n←24

∂(n)isconst(e) ⊃ qF(e)  %1and%*
!eqnexpindhyp ∂(n)isvar(e) ⊃ qF(e)  %1and%*
∂(n)issum(e) ∧ qF(s1(e)) ∧ qF(s2(e)) ⊃ qF(e).
.end

Then we may conclude that qF(e) is true for all expressions ⊗e.  
.end
.ss(objectlang,The object language.)

	We must give both the analytic and synthetic syntaxes for the 
object language because the interpreter defining its semantics uses the 
analytic syntax and the compiler uses the synthetic syntax.  The analytic and
synthetic syntaxes for instructions are given in the table_{table inssyn}.
.begin nofill select 2 group turnon "∂"
.n←8

∂(n)%3operation∂(n+14)predicate∂(n+28)analytic operation∂(n+50)synthetic operation%*

∂(n+1)li∂(n+6)qα∂(n+14)isli(s)∂(n+32)arg(s)∂(n+54)mkli(qα)
∂(n+1)load∂(n+6)x∂(n+14)isload(s)∂(n+32)adr(s)∂(n+54)mkload(x)
∂(n+1)sto∂(n+6)x∂(n+14)issto(s)∂(n+32)adr(s)∂(n+54)mksto(x)
∂(n+1)add∂(n+6)x∂(n+14)isadd(s)∂(n+32)adr(s)∂(n+54)mkadd(x)
.end
!tableinssyn Abstract Syntax of Machine Instructions.!

	A program is a list of instructions and ⊗null(p) asserts that ⊗p is the 
null list.  If the program ⊗p is not null then ⊗first(p) gives the first 
instruction and ⊗rest(p) gives the list of remaining instructions.  We shall use 
the operation ⊗p1*p2 to denote the program obtained by appending ⊗p2 onto the
end of ⊗p1.  Since we have only one level of list we can identify a single
instruction with a program that has just one instruction.

	The synthetic and analytic syntaxes of instructions are related by the 
following.
.begin nofill  select 2 turnon "∂" group
.n←20

∂(n)isli(mkli(qα))
∂(n)arg(mkli(qα)) = qα
!eqnisli ∂(n)isli(s) ⊃ s = mkli(arg(s))
∂(n)null(rest(mkli(qα)))
∂(n)isli(s) ⊃ first(s) = s
.apart
.group

∂(n)isload(mkload(x))
∂(n)x = adr(mkload(x))
!eqnisload ∂(n)isload(s) ⊃ s = mkload(adr(s))
∂(n)null(rest(mkload(x)))
∂(n)isload(s) ⊃ first(s) = s
.apart
.group

∂(n)issto(mksto(x))
∂(n)x = adr(mksto(x))
!eqnissto ∂(n)issto(s) ⊃ s = mksto(adr(s))
∂(n)null(rest(mksto(x)))
∂(n)issto(s) ⊃ first(s) = s
.apart
.group

∂(n)isadd(mkadd(x))
∂(n)x = adr(mkadd(x))
!eqnisadd ∂(n)isadd(s) ⊃ s = mkadd(adr(s))
∂(n)null(rest(mkadd(x)))
∂(n)isadd(s) ⊃ first(s) = s
.apart
.group

∂(n)¬null(p) ⊃ p = first(p) * rest(p),
!eqnstring ∂(n)¬null(p1) ∧ null(rest(p1)) ⊃ p1 = first(p1*p2)
∂(n)null(p1*p2) ≡ null(p1) ∧ null(p2).
.end

The * operation is associative. (The somewhat awkward form of these relations
comes from having a general concatenation operation rather than just an
operation that prefixes a single instruction onto a program.)
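
	Purely as an illustration, instructions and programs can also be represented
concretely as lists, an instruction being a list of its operation name and its operand,
and a program a list of instructions; the relations above then hold by computation.
The names ⊗⊗first-ins⊗, ⊗⊗rest-ins⊗ and ⊗conc are ours, chosen because ⊗first, ⊗rest 
and * already have meanings in present-day LISP.
.begin nofill

;; one possible concrete syntax for the object language; a single
;; instruction is identified with a one-instruction program
(defun mkli   (a) (list (list 'li a)))
(defun mkload (x) (list (list 'load x)))
(defun mksto  (x) (list (list 'sto x)))
(defun mkadd  (x) (list (list 'add x)))

(defun isli   (s) (eq (caar s) 'li))
(defun isload (s) (eq (caar s) 'load))
(defun issto  (s) (eq (caar s) 'sto))
(defun isadd  (s) (eq (caar s) 'add))

(defun arg (s) (cadar s))               ; immediate argument of an li
(defun adr (s) (cadar s))               ; address field of the others

(defun first-ins (p) (list (car p)))    ; first instruction, as a program
(defun rest-ins  (p) (cdr p))
(defun conc (p1 p2) (append p1 p2))     ; the * operation on programs
.end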

	A state vector for a machine gives, for each register in the machine, its
contents.  We include the accumulator denoted by ⊗ac as a register. 
Associated with state vectors are the functions ⊗a (for assign) and 
⊗c (for contents).   Thus

	1.  ⊗⊗c(x,qh)⊗ denotes the value of the contents of register ⊗x in machine
state qh.

	2.  ⊗⊗a(x,qα,qh)⊗ denotes the state vector that is obtained from the state
vector qh by changing the contents of register ⊗x to qα leaving the other
registers unaffected.

The relations satisfied by these functions are:
.begin nofill  group turnon "∂"
.n←20

         ∂(n)⊗⊗     c(x,a(y,qα,qh)) = qif x = y qthen qα qelse c(x,qh),⊗
!eqnvect ∂(n)⊗⊗a(x,qα,a(y,qβ,qh)) = qif x = y qthen a(x,qα,qh) qelse a(y,qβ,a(x,qα,qh)),⊗ 
         ∂(n)⊗⊗      a(x,c(x,qh),qh) = qh.⊗
.end

[We note that the first argument to the ⊗c and ⊗a functions is a variable
in the case of the expression state vector while in the case of the machine
state vector it is a register.  This causes no difficulty; the functions
satisfy the same relations in both cases and can simply be thought of as
polymorphic.]
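
	In the sketches of this chapter a state vector can be represented as an
association list.  The function ⊗c defined earlier already serves for machine states,
which is exactly the polymorphism just mentioned; the assignment function ⊗a is then
(our representation)
.begin nofill

;; a(x,alpha,eta): bind x to alpha, shadowing any earlier binding;
;; since assoc finds the newest binding first,
;;   (c x (a y alpha eta)) = (if (eql x y) alpha (c x eta))
;; the relation a(x,c(x,eta),eta) = eta holds only up to what c can
;; observe, not as equality of lists
(defun a (x alpha eta) (acons x alpha eta))
.end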

	Now we can define the semantics of the object language by
.begin nofill select 2 group turnon "∂"

∂(n)step(s,qh) ←
∂(n+4)qif isli(s) qthen a(ac,arg(s),qh) 
!fcnstep&  ∂(n+4)qelse qif isload(s) qthen a(ac,c(adr(s),qh),qh) 
∂(n+4)qelse qif issto(s) qthen a(adr(s),c(ac,qh),qh) 
∂(n+4)qelse qif isadd(s) qthen a(ac,c(adr(s),qh) + c(ac,qh),qh)
.end

which gives the state vector that results from executing an instruction and

!fcnoutcome&  ⊗⊗⊗outcome(p,qh) ← qif null(p) qthen qh qelse outcome(rest(p),step(first(p),qh))⊗ 

which gives the state vector that results from executing the program ⊗p 
with state vector qh.
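
	With the representations sketched above, ⊗step and ⊗outcome can be transcribed
as follows.  The transcription is ours; it is called ⊗step1 because the name ⊗step is
already taken in present-day LISP.
.begin nofill

;; transcription of step and outcome
(defun step1 (s eta)
  (cond ((isli s)   (a 'ac (arg s) eta))
        ((isload s) (a 'ac (c (adr s) eta) eta))
        ((issto s)  (a (adr s) (c 'ac eta) eta))
        ((isadd s)  (a 'ac (+ (c (adr s) eta) (c 'ac eta)) eta))))

(defun outcome (p eta)
  (if (null p)
      eta
      (outcome (rest-ins p) (step1 (first-ins p) eta))))
.end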

	The following lemma is left as an exercise for the reader.

!eqnoutlem ⊗⊗⊗outcome(p1*p2,qh) = outcome(p2,outcome(p1,qh))⊗


.ss(compiler,The compiler.)

	In order to compile an expression we need to know where the
values of the variables occurring in the expression will be stored.
We shall assume that the function ⊗loc maps each variable to the register
where it will initially be stored in the machine state vector.
If the comparison of the interpretation of an expression and the
execution of its compiled code is to be meaningful, the initial
state vectors must agree on the variables occurring in the expression.  Thus
we will assume that the relation

!eqnloc ⊗⊗⊗c(loc(v),qh) = c(v,qx)⊗

between the state vector qh before the compiled program
starts to act and the state vector qx of the source program holds.

	Now we can write the compiler.  It is
.begin nofill select 2 turnon "∂" group
.n←8

∂(n)compile(e,t) ←
∂(n+4)qif isconst(e) qthen mkli(val(e))
!fcncompile&  ∂(n+4)qelse qif isvar(e) qthen mkload(loc(e))
∂(n+4)qelse qif issum(e) qthen compile(s1(e),t) * mksto(t) * compile(s2(e),t + 1) * mkadd(t)

.end
Here ⊗t is the number of a register such that all variables are stored in 
registers numbered less than ⊗t, so that registers ⊗t and above are available
for temporary storage.
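
	A transcription of the compiler in the same style follows.  It is ours and
makes two simplifying assumptions: it is called ⊗⊗compile-exp⊗ because ⊗compile is
already the name of a standard function in present-day LISP, and ⊗loc simply uses the
variable itself as the name of its register, so that the numeric temporaries ⊗t,
⊗⊗t+1⊗, ... cannot collide with it.
.begin nofill

;; transcription of the compiler
(defun loc (v) v)                       ; sketch only: register = variable name

(defun compile-exp (e tmp)
  (cond ((isconst e) (mkli (val e)))
        ((isvar e)   (mkload (loc e)))
        ((issum e)   (conc (compile-exp (s1 e) tmp)
                           (conc (mksto tmp)
                                 (conc (compile-exp (s2 e) (+ tmp 1))
                                       (mkadd tmp)))))))

;; (compile-exp '(plus (plus x 3) (plus x (plus y 2))) 10) yields the
;; thirteen-instruction program shown earlier, with t = 10
.end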

	Before we can state our definition of correctness of the compiler, 
we need a notion of partial equality for state vectors

⊗⊗⊗qzq1 =%4A%* qzq2⊗

where qzq1 and qzq2 are state vectors and $A is a set of variables means that
corresponding components of qzq1 and qzq2 are equal except possibly for values
of variables in $A.  Symbolically, ⊗⊗x_%8π%*_$$A$_⊃_c(x,qzq1)=c(x,qzq2)⊗.  Partial
equality satisfies the following relations:
.xgenlines←xgenlines-2
.begin nofill turnon "∂" turnoff "{}" group 
.n←16

!eqnpeq3 ∂(n)qzq1 = qzq2 is equivalent to qzq1 =%4{ }%* qzq2 where { } denotes the empty set,
!eqnpeq4 ∂(n)if $A ⊂ $B and qzq1 =%4A%* qzq2 then qzq1 =%4B%* qzq2,
!eqnpeq5 ∂(n)if qzq1 =%4A%* qzq2, then ⊗⊗a(x,qα,qzq1) =%4A-{x}%* a(x,qα,qzq2)⊗, 
!eqnpeq6 ∂(n)if ⊗x ε $A then ⊗⊗a(x,qα,qz) =%4A%* qz⊗,  
!eqnpeq7 ∂(n)if qzq1 =%4A%* qzq2 and qzq2 =%4B%* qzq3, then qzq1 =%4A∪B%* qzq3.

.end

In our case we need a specialization of this notation and will use
.xgenlines←xgenlines-1
.begin nofill group turnon "∂" turnoff "{}"
.n←16

∂(n)qzq1 =%4t%* qzq2 to denote qzq1 =%4{x|x≥t}%* qzq2
∂(n)qzq1 =%4ac%* qzq2 to denote qzq1 =%4{ac}%* qzq2   and
∂(n)qzq1 =%4t,ac%* qzq2 to denote qzq1 =%4{x|x = ac ∨ x ≥ t}%* qzq2. 
.end
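
	For experimenting with these notions (not as part of the formal development)
partial equality can be checked over any finite list of registers of interest, the
exception set being supplied as a predicate.  The function ⊗peq and its arguments
are ours.
.begin nofill

;; partial equality of state vectors, checked only over a finite list
;; of registers; the formal relation quantifies over all registers
(defun peq (zeta1 zeta2 exceptp registers)
  (every (lambda (x)
           (or (funcall exceptp x)
               (eql (c x zeta1) (c x zeta2))))
         registers))

;; equality except on registers numbered tmp or above corresponds to
;;   (lambda (x) (and (numberp x) (>= x tmp)))
;; and equality except on those registers and the accumulator to
;;   (lambda (x) (or (eq x 'ac) (and (numberp x) (>= x tmp))))
.end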

	The correctness of the compiler is stated in

THEOREM 1.  If qh and qx are machine and source language state vectors
respectively such that {eqn loc} holds then

⊗⊗⊗outcome(compile(e,t),qh) =%4t%* a(ac,value(e,qx),qh)⊗

It states that the result of running the compiled program is to put the
value of the expression compiled into the accumulator.  No registers
except the accumulator and those with addresses ≥ ⊗t are affected.
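
	The theorem must of course be proved, but with the sketches of the preceding
sections individual instances of it can be checked by computation.  The following test
harness is ours and is no substitute for the proof; it builds a machine state satisfying
{eqn loc} from a source state vector represented as an association list and compares
the accumulator of the outcome with the value of the expression.
.begin nofill

;; check one instance of the correctness statement
(defun check-instance (e xi tmp)
  (let* ((eta (mapcar (lambda (pair) (cons (loc (car pair)) (cdr pair)))
                      xi))                  ; so c(loc(v),eta) = c(v,xi)
         (final (outcome (compile-exp e tmp) eta)))
    (eql (c 'ac final) (value e xi))))

;; (check-instance '(plus (plus x 3) (plus x (plus y 2)))
;;                 '((x . 1) (y . 4)) 10)  is true
.end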
.ss(proof,Proof of Correctness Theorem.)

	The proof is accomplished by an induction on the expression ⊗e being 
compiled using the principle given in qsect{subsection sourcelang}.
We prove it first for constants, then for variables, and then for sums on the
induction hypothesis that it is true for the summands.  Thus there are three
cases.
.begin nofill select 2 turnon "∂"
.i←4; n←25; m←62;

∂(18)%3Statement ∂(60)Justification%*

%1I.%*  isconst(e). 

∂(i)outcome(compile(e,t),qh)∂(n)= outcome(mkli(val(e)),qh)∂(m){eqn compile!}
∂(n)= step(mkli(val(e)),qh)∂(m){eqn outcome!}, {eqn isli!}
∂(n)= a(ac,arg(mkli(val(e))),qh)∂(m){eqn isli!}, {eqn step!}
∂(n)= a(ac,val(e),qh)∂(m){eqn isli!}
∂(n)= a(ac,value(e,qx),qh)∂(m){eqn value!}
∂(n)=%4t%2 a(ac,value(e,qx),qh).∂(m){eqn peq3!}, {eqn peq4!}

.xgenlines←xgenlines-2

%1II.%*  isvar(e).  

∂(i)outcome(compile(e,t),qh)∂(n)= outcome(mkload(loc(e)),qh)∂(m){eqn compile!}
∂(n)= a(ac,c(adr(mkload(loc(e))),qh),qh)∂(m){eqn outcome!}, {eqn isload!}, {eqn step!}
∂(n)= a(ac,c(loc(e),qh),qh)∂(m){eqn isload!}
∂(n)= a(ac,c(e,qx),qh)∂(m){eqn loc!}
∂(n)= a(ac,value(e,qx),qh)∂(m){eqn value!}
∂(n)=%4t%2 a(ac,value(e,qx),qh).∂(m){eqn peq3!}, {eqn peq4!}


.xgenlines←xgenlines-1
%1III.%*  issum(e).

∂(i)outcome(compile(e,t),qh)∂(n)= outcome(compile(s1(e),t) * mksto(t)∂(m){eqn compile!}
∂(n+4)* compile(s2(e),t + 1) * mkadd(t),qh)
!eqncaseiii1 ∂(n)= outcome(mkadd(t), ∂(m){eqn outlem!}
∂(n+4)outcome(compile(s2(e),t + 1),
∂(n+8)outcome(mksto(t),
∂(n+12)outcome(compile(s1(e),t),qh)))
.end

using the relation between concatenating programs and composing the
functions they represent.  Now we introduce some notation.  Let
.begin nofill select 2 group turnon "∂"
.n←25

∂(n-2)v ∂(n)= value(e,qx),
!eqnvdef ∂(n-2)vq1 ∂(n)= value(s1(e),qx),
∂(n-2)vq2 ∂(n)= value(s2(e),qx),
.end
.xgenlines←xgenlines-1

so that ⊗⊗v = vq1 + vq2⊗.  Further let
.begin nofill select 2 turnon "∂" group
.n←25

              ∂(n-2)qzq1 ∂(n)= outcome(compile(s1(e),t),qh),
!eqnstatedef  ∂(n-2)qzq2 ∂(n)= outcome(mksto(t),qzq1),
              ∂(n-2)qzq3 ∂(n)= outcome(compile(s2(e),t + 1),qzq2),
              ∂(n-2)qzq4 ∂(n)= outcome(mkadd(t),qzq3)
.end

.xgenlines←xgenlines-1
so that by {eqn caseiii1} ⊗⊗qzq4 = outcome(compile(e,t),qh)⊗, and the 
statement to be proved becomes
.begin nofill select 2 turnon "∂" 
.n←25

∂(n-2)qzq4 ∂(n)=%4t%* a(ac,v,qh).
.end
In order to apply the induction hypothesis to show that ⊗s2(e) compiles correctly
we need to know that for each variable ⊗v occurring in ⊗e the following equation holds:
.begin nofill select 2 turnon "∂" 
.n←25

!eqnloc2 ∂(n-10)c(loc(v),qzq2) ∂(n)= c(v,qx) 
.end

This is proved as follows:
.begin nofill select 2 turnon "∂{}"
.n←25; m←62;

∂(n-10)c(loc(v),qzq2)∂(n)= c(loc(v),a(t,vq1,qh)) ∂(m)%1since%* loc(v) < t
∂(n)= c(loc(v),qh) ∂(m)%1for the same reason%*
∂(n)= c(v,qx) ∂(m) {eqn loc!}
.end
.xgenlines←xgenlines-1
Thus the induction hypothesis together with {eqn loc} and {eqn loc2} 
and the definitions {eqn statedef} gives 
.begin nofill select 2 turnon "∂" group
.n←25

!eqninduct1 ∂(n-2)qzq1 ∂(n)= %4t%* a(ac,vq1,qh)
!eqninduct2 ∂(n-2)qzq3 ∂(n)= %4t+1%* a(ac,vq2,qzq2)
.end
.xgenlines←xgenlines-2


Now we resume the main proof
.begin nofill select 2 turnon "∂{}"
.n←25; m←62;

!eqncaseiii2 ∂(n-7)c(ac,qzq1) ∂(n)= vq1 ∂(m){eqn vect!}, {eqn induct1!}

∂(n-2)qzq2∂(n)= outcome(mksto(t),qzq1) ∂(m) {eqn statedef!}
∂(n)= a(t,c(ac,qzq1),qzq1)∂(m){eqn outcome!}, {eqn issto!}, {eqn step!}
∂(n)= a(t,vq1,qzq1)∂(m){eqn caseiii2!}
∂(n)=%4t+1%* a(t,vq1,a(ac,vq1,qh))∂(m){eqn peq5!}, {eqn induct1!}
!eqncaseiii3 ∂(n)=%4t+1,ac%* a(t,vq1,qh)∂(m){eqn peq6!}, {eqn vect!}
.xgenlines←xgenlines-1

!eqncaseiii5 ∂(n-2)qzq3 ∂(n)= %4t+1%2 a(ac,vq2,a(t,vq1,qh))∂(m){eqn induct2!}, {eqn vect!}, {eqn peq5!}, {eqn caseiii3!}

!eqncaseiii6 ∂(n-6)c(ac,qzq3) ∂(n)= vq2  %1and%* c(t,qzq3) = vq1 ∂(m){eqn vect!}, {eqn caseiii5!}

∂(n-2)qzq4∂(n)= outcome(mkadd(t),qzq3) ∂(m) {eqn statedef!}
∂(n)= a(ac,c(t,qzq3) + c(ac,qzq3),qzq3)∂(m){eqn outcome!}, {eqn isadd!}, {eqn step!}
∂(n)= a(ac,v,qzq3)∂(m) {eqn vdef!}, {eqn caseiii6!}, {eqn value!}
∂(n)=%4t+1%* a(ac,v,a(ac,vq2,a(t,vq1,qh)))∂(m){eqn peq5!}, {eqn caseiii5!}
∂(n)=%4t+1%* a(ac,v,a(t,vq1,qh))∂(m){eqn vect!}
∂(n)=%4t%* a(ac,v,qh).∂(m){eqn vect!}, {eqn peq6!}, {eqn peq7!}
.end
.xgenlines←xgenlines-1

This concludes the proof.

.ss(remarks,Remarks.)

	The definition of correctness, the formalism used to express the 
description of
source language, object language and compiler, and the methods of proof are all
intended to serve as prototypes for the more complicated task of proving the
correctness of usable compilers.  The ultimate goal, as outlined in
McCarthy [1962a,1962b,1963,1964] is to make it possible to use a computer to
check proofs that compilers are correct.  
The concepts of abstract syntax, state vector, the use of an interpreter for
defining the semantics of a programming language, and the definition of
correctness of a compiler were introduced in [McCarthy 1962b].  
[McCarthy and Painter 1967],
however, was the first in which the correctness of a compiler was proved.

	The problem of the relations between source language and object language
arithmetic is dealt with here by assuming that the + signs in the definitions
of ⊗value {eqn value} and ⊗step {eqn step} which define the semantics 
of the source and object
languages represent the same operation.  Theorem 1 does not depend on any
properties of this operation, not even commutativity or associativity.

	The proof is entirely straightforward once the necessary machinery has 
been created.  Additional operations such as subtraction, multiplication and
division could be added without essential change in the proof.
For example, to put multiplication into the system the following changes would
be required.

.item←0

#.  Use the analytic syntax, semantics, and induction principle given in
qsect{subsection abstract} for the language including products.

#.  Add an instruction ⊗mul ⊗x and the three syntactical functions ⊗ismul(s), 
⊗adr(s), ⊗mkmul(x) to the abstract syntax of the object language together
with the necessary relations among them.

#.  Add to the definition {eqn step} of ⊗step the clause

⊗⊗⊗qelse qif ismul(s) qthen a(ac,c(adr(s),qh) %8#%* c(ac,qh),qh)⊗.  

#.  Add to the compiler {eqn compile} the clause

⊗⊗⊗qelse qif isprod(e) qthen compile(p1(e),t) * mksto(t)
* compile(p2(e),t + 1) * mkmul(t)⊗.

#.  Add to the proof a case ⊗isprod(e) which parallels the case ⊗issum(e) exactly.
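
	In terms of the concrete sketches of the preceding sections, the whole
extension amounts to a few lines (again only an illustration):
.begin nofill

;; multiplication: new instruction syntax
(defun mkmul (x) (list (list 'mul x)))
(defun ismul (s) (eq (caar s) 'mul))
;; adr already extracts the address field of a mul instruction

;; clause to add to step1:
;;   ((ismul s) (a 'ac (* (c (adr s) eta) (c 'ac eta)) eta))
;; clause to add to compile-exp:
;;   ((isprod e) (conc (compile-exp (p1 e) tmp)
;;                     (conc (mksto tmp)
;;                           (conc (compile-exp (p2 e) (+ tmp 1))
;;                                 (mkmul tmp)))))
.end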


Many extensions of this compiling problem are treated in Painter [1967].


.skip 2
.if lines < 5 then next page;
.cb Exercise
.item←0

#. Modify the arithmetic expression language to allow variable length sums. 
 Describe how to modify the abstract syntax and how to fix up the interpreter and
compiler, in order to give the semantics of the new language and compile it.
Are any modifications of the object language syntax or semantics needed?
How can the proof of correctness of the original compiler be turned into
a proof of correctness of this new compiler?

#. What problems are encountered when conditional expressions are added to
the source language?  In what way is the current object language inadequate
for this case?  How would you solve this problem?
.ss(absntxex, Some substantial exercises.)
.item←0

	The following are some problems in programming and proving that
can be solved using the techniques described in this chapter.

#.   Consider the class of boolean conditional expressions (bces).
A bce is either a literal (⊗lit) or a conditional (⊗if) composed from a triple
of bces with component parts ⊗premise, ⊗consequent and ⊗alternate.  
Make a table giving the abstract syntax of bces.  
Define an evaluator ⊗bval for bces given that ⊗lval determines the truth 
value of literals.  Your evaluator should satisfy the usual rules for
conditional expressions as given in Chapter_{SECTION PROVIN} 
({eqn cond1!!})_-_({eqn cond7!!}).

	Work out the relations between the analytic and synthetic
syntaxes, and give an induction principle for bces.

	We say that a bce is in normal form if
the premise of every subexpression 
is a literal.  Define a function that converts a bce to normal form.
.begin nofill

[Hint: Observe that in our external language the equivalence
⊗⊗⊗qif p qthen [qif q qthen s qelse t] qelse [qif r qthen s qelse t] 
≡ qif [qif p qthen q qelse r] qthen s qelse t⊗
holds.]
.end
Prove that your function is total and that the result is in normal form.
Proving totality will take some thought to get the correct induction
predicate.  Proving correctness, e.g. that the result is normal, will
require making a formal statement of the notion of normality.  This
can be done by defining a predicate ⊗normal on bces.
Show that the normalizing transformation is value preserving.

#. Write the syntax for a language in which programs are sequences
of statements (possibly labeled), where a statement is either an
assignment of a variable
to the value of an expression, a conditional branch where the
only tests are ⊗e=0 and ⊗e>0, or a return statement which returns
the value of an expression.  For simplicity assume expressions
are of the form described in qsect{subsection sourcelang}.  
Extend the machine language of qsect{subsection objectlang}
to allow conditional branches.  Now write interpreters and a
compiler for your new language and machine.  Outline a proof of
correctness.  What new difficulties are encountered when branches
are allowed in the source language?